refactor(agent-service): redesign sync-execution result and error model by bobbai00 · Pull Request #6009 · apache/texera

bobbai00 · 2026-06-29T00:41:29Z

What changes were proposed in this PR?

Restructures the per-operator summary the sync-execution backend returns and the agent-service / frontend consume, for a leaner, consistent wire contract. This is a focused re-do of #5927 cut directly from main (no foundation stack): it changes only the execution result/error model and its consumers.

- Replace the flat OperatorInfo with OperatorExecutionSummary (orthogonal sub-summaries: state, errorMessages, resultSummary?, consoleLogsSummary?); rename SyncExecutionResult → WorkflowExecutionSummary.
- resultSummary.sampleTuples is now SampleRow[] ({ rowIndex, tuple }) instead of JSON rows with an embedded __row_index__; drop the table-shape types (the agent derives input-port shapes from the DAG).
- Move WorkflowFatalError into types/execution.ts and reuse it for per-operator errors. It is the same type the workflow-compiling service returns for compilation errors, so compile and execution errors share one wire shape; api/compile-api.ts re-exports it so its existing importers are unchanged.
- errorMessages / errors are non-optional (empty = none); drop compilationErrors; collapse the console-message types and derive warnings from WARNING:-titled messages.
- Operator results are still pulled on demand via GET /agents/:id/operator-results (transport unchanged); that REST payload now carries the canonical OperatorExecutionSummary, and the frontend maps it to its flat display type (re-flattening sampleTuples so the display components are unchanged).
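For reviewers who want the new contract at a glance, here is a minimal TypeScript sketch of the types listed above, plus the two derivations the bullets mention: re-flattening sampleTuples for display, and deriving warnings from WARNING:-titled console messages. Only the type and field names quoted in the bullets come from the PR. The WorkflowFatalError fields, the console message shape, the `operators` map, the state value, and the helper names flattenSampleTuples and deriveWarnings are hypothetical placeholders, not the actual code in types/execution.ts.

```typescript
// Names from the PR: OperatorExecutionSummary, state, errorMessages,
// resultSummary, consoleLogsSummary, sampleTuples, SampleRow, rowIndex, tuple,
// WorkflowExecutionSummary, errors, WorkflowFatalError.
// Every other field or helper name below is a hypothetical placeholder.

// Assumed fields; the real WorkflowFatalError shape is not shown in the PR.
interface WorkflowFatalError {
  message: string;
  details: string;
}

interface SampleRow {
  rowIndex: number; // position of this row in the full operator result
  tuple: Record<string, unknown>;
}

// Hypothetical console message shape.
interface ConsoleMessage {
  title: string;
  message: string;
}

interface OperatorExecutionSummary {
  state: string;
  errorMessages: WorkflowFatalError[]; // non-optional: [] means no errors
  resultSummary?: { sampleTuples: SampleRow[] };
  consoleLogsSummary?: { messages: ConsoleMessage[] }; // "messages" is assumed
}

interface WorkflowExecutionSummary {
  errors: WorkflowFatalError[]; // non-optional: [] means no errors
  operators: Record<string, OperatorExecutionSummary>; // field name assumed
}

// Rebuild flat display rows (row index embedded under __row_index__) from
// SampleRow[] so existing display components can consume them unchanged.
function flattenSampleTuples(rows: SampleRow[]): Record<string, unknown>[] {
  return rows.map(({ rowIndex, tuple }) => ({ ...tuple, __row_index__: rowIndex }));
}

// Warnings are not a separate type: they are console messages whose title
// starts with "WARNING:" (the exact matching rule is an assumption).
function deriveWarnings(summary: OperatorExecutionSummary): ConsoleMessage[] {
  const messages = summary.consoleLogsSummary?.messages ?? [];
  return messages.filter((m) => m.title.startsWith("WARNING:"));
}

// Illustrative value: two sampled rows with a gap between them, one warning.
const example: OperatorExecutionSummary = {
  state: "COMPLETED",
  errorMessages: [],
  resultSummary: {
    sampleTuples: [
      { rowIndex: 0, tuple: { city: "Irvine" } },
      { rowIndex: 41, tuple: { city: "Davis" } },
    ],
  },
  consoleLogsSummary: { messages: [{ title: "WARNING: nulls", message: "3 rows had nulls" }] },
};
```

Keeping rowIndex beside tuple, rather than inside it, means a data column literally named __row_index__ cannot collide with the marker on the wire; in this sketch the collision risk only reappears at the display edge, where the flattening happens.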

Touches the Scala producer (SyncExecutionResource), the agent-service consumers (result-formatting, workflow-execution-tools, workflow-result-state, server), and the frontend mapping. The changes are representation/type-level and preserve behavior; the only difference is that input-port shape lines are now derived rather than explicitly rendered.

Any related issues, documentation, discussions?

Closes #5750
Part of #5747.
Supersedes #5927.

How was this PR tested?

- agent-service: tsc --noEmit clean, bun test 110/110 pass, prettier --check clean.
- The Scala producer (SyncExecutionResource) is unchanged from #5927, which verified it via sbt WorkflowExecutionService/compile and a full-stack end-to-end run: a Claude Haiku 4.5 agent built and executed a CSV workflow, and /operator-results returned the new shape (resultSummary.sampleTuples: [{ rowIndex, tuple }], errorMessages: []).

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.8 (1M context)

github-actions · 2026-06-29T00:41:46Z

Automated Reviewer Suggestions

Based on the git blame history of the changed files, we recommend the following reviewers:

Contributors with relevant context: @Ma77Ball, @Yicong-Huang
You can notify them by mentioning @Ma77Ball, @Yicong-Huang in a comment.

codecov-commenter · 2026-06-29T00:42:50Z

Codecov Report

❌ Patch coverage is 86.77419% with 41 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.30%. Comparing base (878eb8a) to head (e411b54).
⚠️ Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
...he/texera/web/resource/SyncExecutionResource.scala	75.00%	27 Missing and 8 partials ⚠️
...agent-interaction/agent-interaction.component.html	0.00%	4 Missing and 2 partials ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #6009      +/-   ##
============================================
+ Coverage     56.37%   58.30%   +1.92%     
- Complexity     2992     3094     +102     
============================================
  Files          1129     1130       +1     
  Lines         43802    43624     -178     
  Branches       4743     4731      -12     
============================================
+ Hits          24695    25436     +741     
+ Misses        17658    16744     -914     
+ Partials       1449     1444       -5
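The coverage figures are internally consistent, and a few lines of TypeScript make the arithmetic explicit. This sketch assumes Codecov computes coverage as hits / lines (partials not counted as covered) and truncates to two decimals, which is what its default `precision: 2` / `round: down` settings do; under that assumption the base and head totals reproduce 56.37% and 58.30%. The patch header fits as well: the file table lists 41 uncovered lines (27 + 8 + 4 + 2), and 269 covered out of 310 changed lines gives 86.77419%. The 310 total is inferred from that percentage, not reported by Codecov, and the helper names are hypothetical.

```typescript
// Line totals copied from the Coverage Diff above.
interface CoverageTotals {
  hits: number;
  misses: number;
  partials: number;
  lines: number;
}

const base: CoverageTotals = { hits: 24695, misses: 17658, partials: 1449, lines: 43802 };
const head: CoverageTotals = { hits: 25436, misses: 16744, partials: 1444, lines: 43624 };

// Every tracked line is exactly one of: hit, missed, or partial.
function isConsistent(t: CoverageTotals): boolean {
  return t.hits + t.misses + t.partials === t.lines;
}

// Coverage as hits / lines, truncated (not rounded) to two decimals.
function coveragePercent(hits: number, lines: number): number {
  return Math.floor((hits / lines) * 10000) / 100;
}
```

Truncation also explains why the reported delta is +1.92% rather than 58.30 − 56.37 = 1.93: the delta is taken between the unrounded ratios (about 1.9286 points) and then truncated.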

Flag	Coverage Δ		*Carryforward flag
access-control-service	`70.00% <ø> (ø)`		Carriedforward from 2b6156f
agent-service	`57.37% <100.00%> (+12.77%)`	⬆️
amber	`59.57% <75.00%> (+2.19%)`	⬆️
computing-unit-managing-service	`0.00% <ø> (ø)`		Carriedforward from 2b6156f
config-service	`52.30% <ø> (ø)`		Carriedforward from 2b6156f
file-service	`62.81% <ø> (ø)`		Carriedforward from 2b6156f
frontend	`50.38% <89.09%> (+0.31%)`	⬆️
notebook-migration-service	`78.57% <ø> (ø)`		Carriedforward from 2b6156f
pyamber	`91.15% <ø> (ø)`		Carriedforward from 2b6156f
python	`90.69% <ø> (ø)`		Carriedforward from 2b6156f
workflow-compiling-service	`55.14% <ø> (ø)`		Carriedforward from 2b6156f

*This pull request uses carry forward flags.

github-actions · 2026-06-29T00:47:27Z

⚠️ Benchmark changes need a look

🟢 0 better · 🔴 3 worse · ⚪ 12 noise (<±5%) · 0 without baseline

Compared against main 24b587f benchmarked on this same runner, so the delta is largely free of cross-runner hardware noise. The "7d avg" column still reflects the gh-pages dashboard. Treat <±5% as noise unless repeated.

	config	throughput (tuples/sec)	MB/s	latency p50/p95/p99	max Δ latest / 7d
🔴	bs=10 sw=10 sl=64	393	0.24	24,465/34,026/34,026 us	🔴 +9.1% / 🔴 +130.2%
⚪	bs=100 sw=10 sl=64	793	0.484	124,891/146,123/146,123 us	⚪ within ±5% / 🔴 +36.7%
⚪	bs=1000 sw=10 sl=64	917	0.56	1,089,500/1,120,580/1,120,580 us	⚪ within ±5% / 🔴 +10.7%

Baseline details

Latest main 24b587f from same runner

config	metric	PR	latest main	7d avg	Δ latest	Δ 7d
bs=10 sw=10 sl=64	throughput	393 tuples/sec	422 tuples/sec	786.27 tuples/sec	-6.9%	-50.0%
bs=10 sw=10 sl=64	MB/s	0.24 MB/s	0.257 MB/s	0.48 MB/s	-6.6%	-50.0%
bs=10 sw=10 sl=64	p50	24,465 us	22,427 us	12,495 us	+9.1%	+95.8%
bs=10 sw=10 sl=64	p95	34,026 us	34,979 us	14,784 us	-2.7%	+130.2%
bs=10 sw=10 sl=64	p99	34,026 us	34,979 us	18,468 us	-2.7%	+84.2%
bs=100 sw=10 sl=64	throughput	793 tuples/sec	793 tuples/sec	991.49 tuples/sec	0.0%	-20.0%
bs=100 sw=10 sl=64	MB/s	0.484 MB/s	0.484 MB/s	0.605 MB/s	0.0%	-20.0%
bs=100 sw=10 sl=64	p50	124,891 us	121,930 us	100,929 us	+2.4%	+23.7%
bs=100 sw=10 sl=64	p95	146,123 us	150,031 us	106,894 us	-2.6%	+36.7%
bs=100 sw=10 sl=64	p99	146,123 us	150,031 us	114,085 us	-2.6%	+28.1%
bs=1000 sw=10 sl=64	throughput	917 tuples/sec	914 tuples/sec	1,023 tuples/sec	+0.3%	-10.4%
bs=1000 sw=10 sl=64	MB/s	0.56 MB/s	0.558 MB/s	0.624 MB/s	+0.4%	-10.3%
bs=1000 sw=10 sl=64	p50	1,089,500 us	1,094,222 us	983,835 us	-0.4%	+10.7%
bs=1000 sw=10 sl=64	p95	1,120,580 us	1,119,531 us	1,023,777 us	+0.1%	+9.5%
bs=1000 sw=10 sl=64	p99	1,120,580 us	1,119,531 us	1,053,883 us	+0.1%	+6.3%
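The bot's verdict can be reproduced from the "Δ latest" column. The sketch below assumes a metric counts as noise when |Δ| < 5% and otherwise as better or worse depending on whether higher (throughput, MB/s) or lower (latency) values are preferable. Under that assumed rule, the 15 values yield exactly the 0 better / 3 worse / 12 noise tally in the header, with bs=10 throughput, MB/s, and p50 as the three regressions. percentDelta, classify, and tally are hypothetical names, not the benchmark bot's code.

```typescript
type Direction = "higherIsBetter" | "lowerIsBetter";
type Verdict = "better" | "worse" | "noise";

// Percent change of the PR value relative to a baseline, one decimal place.
function percentDelta(pr: number, baseline: number): number {
  return Math.round(((pr - baseline) / baseline) * 1000) / 10;
}

// Assumed rule: |delta| < 5% is noise; otherwise the sign, read against the
// metric's preferred direction, decides better vs. worse.
function classify(deltaPct: number, dir: Direction): Verdict {
  if (Math.abs(deltaPct) < 5) return "noise";
  const improved = dir === "higherIsBetter" ? deltaPct > 0 : deltaPct < 0;
  return improved ? "better" : "worse";
}

// "Δ latest" values from the table, per config: [throughput, MB/s, p50, p95, p99].
const deltaLatest: number[][] = [
  [-6.9, -6.6, 9.1, -2.7, -2.7], // bs=10
  [0.0, 0.0, 2.4, -2.6, -2.6], // bs=100
  [0.3, 0.4, -0.4, 0.1, 0.1], // bs=1000
];
const directions: Direction[] = [
  "higherIsBetter",
  "higherIsBetter",
  "lowerIsBetter",
  "lowerIsBetter",
  "lowerIsBetter",
];

// Count verdicts across every config and metric.
function tally(rows: number[][]): Record<Verdict, number> {
  const counts: Record<Verdict, number> = { better: 0, worse: 0, noise: 0 };
  for (const row of rows) {
    row.forEach((delta, i) => {
      counts[classify(delta, directions[i])] += 1;
    });
  }
  return counts;
}
```

Only the same-runner deltas drive the tally. The much larger 7d deltas (for example +130.2% p95 for bs=10) are compared against the gh-pages dashboard average rather than this runner, which is why the comment treats them separately.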

Raw CSV

config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,509.49,200,128000,393,0.240,24465.12,34025.70,34025.70
1,100,10,64,20,2520.67,2000,1280000,793,0.484,124890.88,146123.18,146123.18
2,1000,10,64,20,21805.37,20000,12800000,917,0.560,1089499.77,1120579.96,1120579.96
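The derived CSV columns follow from the input parameters. As inferred from these three rows (not from the benchmark harness source), total_tuples = num_batches × batch_size, total_bytes = total_tuples × schema_width × string_len, tuples_per_sec = total_tuples / seconds, and mb_per_sec uses MB = 2^20 bytes. The sketch parses the raw CSV and recomputes those columns; parseCsv and derive are hypothetical helpers.

```typescript
// Raw CSV copied from the benchmark comment above.
const RAW_CSV = `config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,509.49,200,128000,393,0.240,24465.12,34025.70,34025.70
1,100,10,64,20,2520.67,2000,1280000,793,0.484,124890.88,146123.18,146123.18
2,1000,10,64,20,21805.37,20000,12800000,917,0.560,1089499.77,1120579.96,1120579.96`;

type BenchRow = Record<string, number>;

// Map each data line to { column name -> numeric value }.
function parseCsv(text: string): BenchRow[] {
  const [header, ...lines] = text.trim().split("\n");
  const columns = header.split(",");
  return lines.map((line) => {
    const cells = line.split(",").map(Number);
    const row: BenchRow = {};
    columns.forEach((col, i) => {
      row[col] = cells[i];
    });
    return row;
  });
}

// Recompute the derived columns from the run parameters (inferred relations).
function derive(row: BenchRow) {
  const seconds = row.total_ms / 1000;
  const totalTuples = row.num_batches * row.batch_size;
  const totalBytes = totalTuples * row.schema_width * row.string_len;
  return {
    totalTuples,
    totalBytes,
    tuplesPerSec: Math.round(totalTuples / seconds),
    mbPerSec: Math.round((totalBytes / (1024 * 1024) / seconds) * 1000) / 1000,
  };
}
```

Every configuration moves the same 20 batches of 640-byte tuples, so throughput differences across rows reflect batch size alone.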

Add unit tests for the redesigned sync-execution result/error model to bring patch coverage to 100%:
- workflow-execution-tools: drive executeOperatorAndFormat and createExecuteOperatorTool across pre-flight guards, successful runs (shape/warnings/gaps/cell types/truncation), execution failures (FAILED/KILLED/ERROR, per-operator and general errors), abort propagation, and callback failures.
- texera-agent: exercise getFormattedResultsForDAG over the visible result branch.
- workflow-result-state: cover getOperatorInfo.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Wrap long tuple/list expressions to the 100-column limit and drop a stray blank line so scalafmtCheckAll passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- agent-interaction: unit-test the visualization HTML caching and the column and row derivation (ellipsis on index gaps) via direct construction with stubbed services.
- result-table-frame: cover setupResultTable populating currentResult, columns, and totalNumTuples for non-empty data.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
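The agent-interaction row derivation mentioned above (an ellipsis wherever sampled row indices jump) relies on each SampleRow carrying its original rowIndex. A minimal sketch, assuming rows arrive sorted by rowIndex; withGapMarkers, deriveColumns, and the DisplayRow shape are hypothetical, not the component's actual code.

```typescript
interface SampleRow {
  rowIndex: number;
  tuple: Record<string, unknown>;
}

// A display row is either a real sampled row or a gap marker that the table
// can render as an ellipsis.
type DisplayRow =
  | { kind: "row"; rowIndex: number; tuple: Record<string, unknown> }
  | { kind: "gap"; skipped: number };

// Insert a gap marker wherever consecutive rowIndex values are not adjacent.
// Assumes `rows` is sorted by rowIndex.
function withGapMarkers(rows: SampleRow[]): DisplayRow[] {
  const out: DisplayRow[] = [];
  let prev: number | undefined;
  for (const { rowIndex, tuple } of rows) {
    if (prev !== undefined && rowIndex > prev + 1) {
      out.push({ kind: "gap", skipped: rowIndex - prev - 1 });
    }
    out.push({ kind: "row", rowIndex, tuple });
    prev = rowIndex;
  }
  return out;
}

// Derive table columns as the union of tuple keys, in first-seen order.
function deriveColumns(rows: SampleRow[]): string[] {
  const seen = new Set<string>();
  for (const { tuple } of rows) {
    for (const key of Object.keys(tuple)) seen.add(key);
  }
  return Array.from(seen);
}
```

Because the index travels with each row, the gap size is exact (indices 1 and 8 mean six rows were omitted) without the renderer needing the total row count.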

Add SyncExecutionResourceSpec covering the unit-testable changed lines: the new result/error summary case classes and handleExecutionError (all compilation-error branches plus the unknown-error fallback), via PrivateMethodTester so no production visibility changes are needed. The remaining changed lines (executeWorkflowSync orchestration, collectOperatorInfos, and the collectOperatorResult truncation loop) drive a live Pekko execution + DB/Iceberg-backed result documents and are exercised by the integration suite, which is out of unit-test scope.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…rs for testing

Extract two behavior-preserving pure helpers so the result sampling and per-operator summary logic can be unit-tested without a live engine:
- sampleAndTruncateTuples(tupleIterator, totalCount, ...): the symmetric result truncation / sampling previously inlined in collectOperatorResult.
- buildOperatorExecutionSummary(...): the per-operator summary + console error extraction previously inlined in collectOperatorInfos.

Both are pure code moves (identical expressions); call sites delegate to them. Expand SyncExecutionResourceSpec to cover every branch of both (empty/visualization/front-only/oversized/sliding-window truncation, and result/console-error/no-result summaries).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
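The PR text does not show how sampleAndTruncateTuples implements its "symmetric result truncation / sampling". One plausible reading, sketched in TypeScript under that assumption, keeps the first ceil(maxRows / 2) and last floor(maxRows / 2) rows, tags each with its original index, and holds the tail in a sliding window so the iterator is consumed once. sampleHeadAndTail is a hypothetical name; the real Scala helper also takes totalCount and handles visualization and oversized results, which this sketch omits.

```typescript
interface SampleRow<T> {
  rowIndex: number;
  tuple: T;
}

// Keep a head and a tail of roughly equal size from a single pass over
// `items`, remembering each row's original position.
function sampleHeadAndTail<T>(items: Iterable<T>, maxRows: number): SampleRow<T>[] {
  const headCount = Math.ceil(maxRows / 2);
  const tailCount = maxRows - headCount;
  const head: SampleRow<T>[] = [];
  const tail: SampleRow<T>[] = []; // sliding window over the most recent rows
  let index = 0;
  for (const tuple of items) {
    if (index < headCount) {
      head.push({ rowIndex: index, tuple });
    } else if (tailCount > 0) {
      tail.push({ rowIndex: index, tuple });
      // Drop the oldest row once the window is full (shift() is O(n); a ring
      // buffer would avoid that for large windows).
      if (tail.length > tailCount) tail.shift();
    }
    index++;
  }
  return head.concat(tail);
}
```

Because it only needs an Iterable, the same function works over a generator that streams tuples from storage, and when the input is shorter than maxRows every row is kept with its original index.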

…ting

Factor the final WorkflowExecutionSummary assembly (fatal-error formatting, operator-console-error detection, state string, and success determination) out of executeWorkflowSync into a pure assembleExecutionSummary helper. Behavior-preserving code move; the live observable-wait / termination handling stays inline. Add unit tests covering the state/success derivation across terminal, console-error, target-results-override, operator-error, and fatal-error cases (SyncExecutionResourceSpec now 29 tests).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions Bot assigned bobbai00 Jun 29, 2026

github-actions Bot added the engine, refactor, frontend, and agent-service labels Jun 29, 2026

bobbai00 mentioned this pull request Jun 29, 2026

refactor(agent-service): redesign sync-execution result and error model #5927

Closed

bobbai00 marked this pull request as draft June 29, 2026 01:00

bobbai00 force-pushed the refactor/sync-execution-result-model branch from 89eb9e9 to 84022c9 June 29, 2026 04:56

bobbai00 marked this pull request as ready for review June 29, 2026 08:25

bobbai00 added 5 commits July 1, 2026 23:36

refactor(frontend): consume execution summaries directly

d744bdb

refactor(agent-service): keep operator result summary name

8484d66

refactor: align execution result summary contract

a3b179d

refactor: remove legacy result marker fields

66d9785

bobbai00 force-pushed the refactor/sync-execution-result-model branch from 43166b9 to 66d9785 July 2, 2026 09:18

bobbai00 requested a review from Yicong-Huang July 2, 2026 09:25

bobbai00 and others added 6 commits July 2, 2026 02:39

style(execution-service): apply scalafmt to SyncExecutionResource

2d8e9c4

bobbai00 wants to merge 11 commits into apache:main from bobbai00:refactor/sync-execution-result-model

bobbai00 commented Jun 29, 2026

Uh oh!

github-actions Bot commented Jun 29, 2026 •

edited

Loading

Uh oh!

codecov-commenter commented Jun 29, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 29, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

bobbai00 commented Jun 29, 2026

What changes were proposed in this PR?

Any related issues, documentation, discussions?

How was this PR tested?

Was this PR authored or co-authored using generative AI tooling?

Uh oh!

github-actions Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Automated Reviewer Suggestions

Uh oh!

codecov-commenter commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

github-actions Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ Benchmark changes need a look

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions Bot commented Jun 29, 2026 •

edited

Loading

codecov-commenter commented Jun 29, 2026 •

edited

Loading

github-actions Bot commented Jun 29, 2026 •

edited

Loading